In this EDA (Exploratory Data Analysis) project I explore a dataset of financial contributions to the 2016 elections, examining its structure, its variables, and the patterns and relationships between those variables. I will start with a few one-variable explorations, check their distributions, and then move on to relationships between two or more variables.
The first variable will be ‘amount’, the money a contributor donated to one or more of the candidates. It is the only vector in the downloaded dataset that is numeric rather than character.
Important: the ‘finance’ dataset loaded here was ‘munged’ in a separate file called ‘all-munge.R’. The original dataset had 19 columns, some of which I found less interesting for the scope of this project, so I removed them.
In that file (‘all-munge.R’) I also:
  • changed a few of the variable names;
  • shortened the candidates’ names to their last name only;
  • restricted the data to the primaries and general elections of 2016;
  • removed all the donated amounts that had a minus sign (-);
  • added a column for the candidate’s party affiliation;
  • added a column with the gender of the contributor, based on a pre-defined database that I downloaded to my computer;
  • added a new column with the day and the year, extracted from the contributions’ date column.
All in all, this produced a better organized dataset to work with for the EDA.
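The munging steps above could look roughly like the following base R sketch. The toy rows, the `party_lookup` vector and the "Last, First" name format are assumptions made for illustration; the actual code lives in ‘all-munge.R’ and is not shown in this report.

```r
# Toy stand-in for the raw data; all values are made up.
raw <- data.frame(
  candidate   = c("Clinton, Hillary Rodham", "Trump, Donald J.",
                  "Sanders, Bernard", "Trump, Donald J."),
  amount      = c(25, -100, 50, 2700),
  election_tp = c("P2016", "G2016", "P2016", "G2012"),
  stringsAsFactors = FALSE
)

# Keep only the 2016 primaries and general election.
finance <- raw[raw$election_tp %in% c("P2016", "G2016"), ]

# Drop amounts that carry a minus sign.
finance <- finance[finance$amount >= 0, ]

# Shorten "Last, First" to the last name only.
finance$candidate <- sub(",.*$", "", finance$candidate)

# Hypothetical lookup from candidate to party.
party_lookup <- c(Clinton = "Democrat", Sanders = "Democrat",
                  Trump = "Republican")
finance$party <- unname(party_lookup[finance$candidate])

print(finance)
```

Doing the filtering before the string work keeps the slower operations on fewer rows, which matters with several million observations.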

First, let’s take a look at the dataset, get familiar with the variables and ask questions about the data.

How many rows (individual contributions) and columns does the dataset have?
## [1] 7353272      14

We can see above that the dataset now has 7,353,272 observations (individual contributions) and 14 columns. Thirteen of the columns are described below; the fourteenth, “tran_id”, is a transaction identifier that appears in the summary further down. The columns are:

“cand_id” - candidate ID
“candidate” - Candidate name
“contributor” - Contributor name
“city” - Contributor city
“state” - Contributor state
“zipcode” - Contributor zipcode
“employer” - Contributor employer
“occupation” - Contributor occupation
“amount” - Amount contributed
“date” - Contribution transaction date
“election_tp” - Election type (General or Primaries)
“party” - The political party of the candidate
“gender” - The contributor’s gender

Show basic statistics of the variables

##       cand_id          candidate                 contributor     
##  P00003392:3506373   Clinton:3506373   TRUITT, ROBERTA :   1520  
##  P60007168:2042624   Sanders:2042624   BODNICK, KATIE  :   1313  
##  P80001571: 746616   Trump  : 746616   AMISIAL, WILFRID:   1078  
##  P60006111: 543405   Cruz   : 543405   PURCELL, LARRY  :    722  
##  P60005915: 246313   Carson : 246313   SMITH, DAVID    :    686  
##  P60006723:  98957   Rubio  :  98957   WILLIAMS, JAMES :    682  
##  (Other)  : 168984   (Other): 168984   (Other)         :7347271  
##             city                   state            zipcode     
##  NEW YORK     : 204204   california   :1294446   Min.   :    0  
##  LOS ANGELES  : 102524   new york     : 640831   1st Qu.:20837  
##  SAN FRANCISCO:  90577   texas        : 539992   Median :53095  
##  WASHINGTON   :  90229   florida      : 420024   Mean   :52971  
##  BROOKLYN     :  87279   washington   : 293342   3rd Qu.:89141  
##  SEATTLE      :  83549   massachusetts: 279133   Max.   :99999  
##  (Other)      :6694910   (Other)      :3885504   NA's   :390    
##                   employer                       occupation     
##  N/A                  : 995348   RETIRED              :1642509  
##  RETIRED              : 908452   NOT EMPLOYED         : 626063  
##  SELF-EMPLOYED        : 535009   INFORMATION REQUESTED: 239700  
##  NONE                 : 452810   ATTORNEY             : 199767  
##  NOT EMPLOYED         : 265417   TEACHER              : 141592  
##  INFORMATION REQUESTED: 239965   PHYSICIAN            : 111942  
##  (Other)              :3956271   (Other)              :4391699  
##      amount             date                            tran_id       
##  Min.   :      0   Min.   :2013-10-01   A4EA7F7D9338943869B5:      8  
##  1st Qu.:     15   1st Qu.:2016-03-02   AA2F3125A0DB141928EB:      8  
##  Median :     28   Median :2016-05-27   AAC874DDA3EA04584A39:      8  
##  Mean   :    127   Mean   :2016-05-19   AB37264C070244DDDBF7:      8  
##  3rd Qu.:     92   3rd Qu.:2016-09-04   SA17A.4143          :      7  
##  Max.   :4904861   Max.   :2016-12-31   A1F4C793991D1416D939:      6  
##                                         (Other)             :7353227  
##  election_tp             party            gender       
##  G2016:2607976   Democrat   :5556219   female:3703574  
##  P2016:4745296   Green      :   9033   male  :3649698  
##                  Independent:   1289                   
##                  Republican :1786731                   
##                                                        
##                                                        
## 

Many interesting points about the data can be seen in the table above. Under the ‘candidate’ column, Hillary Clinton had the highest number of contributions, followed by Bernie Sanders and Donald Trump. Did she also lead in the total amount contributed, and not only in the number of contributions?
Other things we can see in this first glance at the counts:
  • New York is the leading city with 204,204 contributions.
  • California is the leading state with 1,294,446 contributions.
  • ‘RETIRED’ is the most common value under ‘occupation’ and the second most common under ‘employer’.
  • The Democratic party received about 3 times as many contributions as the Republican party (5,556,219 / 1,786,731).
  • The amounts donated started from a few cents and reached $4,904,861, which is recorded for a single contributor. I wonder who that was.
Within the scope of this project I will focus on only a few of these questions and variables, and drill down where the distributions and the connections between the variables need a closer look.

One variable exploration

Amount

Let’s see first how much money was contributed in these elections by all contributors.

## [1] 932698768

The sum of all contributions to all candidates in the 2016 elections was $932,698,768, as recorded in this dataset, which was downloaded from the fec.gov website.

Plot the amount variable’s distribution


The first (left) plot looks uninformative, but we can still learn a few basic things about the ‘amount’ distribution from it. First, following the amounts on the x axis, we can see the huge gap between the highest and lowest contributions. Second, most of the contributions were not very far from 0 and certainly not in the millions. The second plot, drawn after dropping the top 3% of the contributions, shows that most of the contributions were indeed below $200. The median contribution in this distribution is $28. I will split donors into big and small at the $200 mark and check which candidates were supported by each group.
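A minimal sketch of how the top 3% can be dropped before plotting, assuming a numeric vector named `amount`. The values here are simulated for illustration, with one extreme value added to mimic the largest record in the summary.

```r
set.seed(1)
# Made-up, right-skewed amounts standing in for the real 'amount' column.
amount <- c(round(rlnorm(1000, meanlog = 3.3, sdlog = 1)), 4904861)

cutoff  <- quantile(amount, 0.97)    # 97th percentile
trimmed <- amount[amount <= cutoff]  # drop the top 3%

# Side-by-side histograms: full range vs. trimmed range.
par(mfrow = c(1, 2))
hist(amount,  main = "All contributions",  xlab = "Amount ($)")
hist(trimmed, main = "Without the top 3%", xlab = "Amount ($)")

median(amount)
```

The median barely moves when the top values are removed, which is why it describes a skewed distribution like this one better than the mean does.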

The amount distribution after taking the log10


With fewer outliers and less variability, the distribution is easier to read, and it now looks roughly like a normal distribution.
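The transformation itself is a one-liner. The sketch below uses a handful of made-up amounts and assumes that zero-dollar rows are set aside first, since the logarithm of zero is not finite.

```r
# Made-up sample values spanning the range seen in the summary.
amount <- c(0, 1, 15, 28, 92, 250, 2700, 4904861)

# log10(0) is -Inf, so zero-dollar rows are excluded before transforming.
positive   <- amount[amount > 0]
log_amount <- log10(positive)

round(log_amount, 2)
```

On this scale each unit is a factor of ten, so the whole range from $1 to about $4.9 million fits in fewer than seven units.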

Candidate

Who were the candidates and which party did they represent?


There were 18 Republicans, 5 Democrats, 1 Independent and 1 Green among the 25 candidates in the 2016 elections. Republicans outnumbered the Democrats by more than 3 to 1, and the Green and Independent parties by 18 to 1.
This party map of candidates brings up many questions. First, why did the Republicans have so many more candidates than the next biggest party? Does it mean that the Republicans are more open to bringing different perspectives into their ranks, and that the Democrats are more inclusive?
Another obvious point is that mainly two parties competed in these elections, and the small ones seemed to have a very slim chance of winning. This is not only because they fielded so few candidates, but also because they collected relatively small amounts compared to the two big parties, as will be demonstrated later on.
The American political system has been a two-party system since its inception, from the Federalist and Democratic-Republican parties to today’s Democratic and Republican parties. An interesting question for further investigation is what chance a third party has of counting in the American political system, and whether we can learn this from the available data.

Facet the number of contributions by all candidates (up to $250)


Looking at the different histograms, Hillary Clinton seems to lead in the number of contributions, followed by Sanders and Trump. It is not really clear from this plot who comes next in descending order. It could be Rubio, Cruz, Bush or Carson. I will dive into who really received the highest number of contributions and who received the highest amount of contributions.

Compare the number of contributions per candidate with a bar plot


The bar plot says it all. Clinton led these elections in the number of contributions, followed by Sanders, Trump, Cruz, Carson, Rubio, Paul, Fiorina, Bush and Kasich, in this order. So, exactly how many contributions did each of the top 10 candidates receive?

Number of contributions and percent per candidate

## # A tibble: 25 x 3
##    candidate contributions percent
##    <chr>             <int>   <dbl>
##  1 Clinton         3506373   47.7 
##  2 Sanders         2042624   27.8 
##  3 Trump            746616   10.2 
##  4 Cruz             543405    7.39
##  5 Carson           246313    3.35
##  6 Rubio             98957    1.35
##  7 Paul              31564    0.43
##  8 Fiorina           27615    0.38
##  9 Bush              27487    0.37
## 10 Kasich            25238    0.34
## 11 Johnson           13341    0.18
## 12 Stein              9033    0.12
## 13 Walker             6552    0.09
## 14 Huckabee           6396    0.09
## 15 Christie           5786    0.08
## 16 O'Malley           5088    0.07
## 17 Graham             3725    0.05
## 18 Santorum           1677    0.02
## 19 Lessig             1344    0.02
## 20 McMullin           1289    0.02
## 21 Perry               896    0.01
## 22 Webb                790    0.01
## 23 Jindal              764    0.01
## 24 Pataki              323    0   
## 25 Gilmore              76    0

Hillary Clinton received 48% of the total contributions, followed by Bernie Sanders with 28% and then Donald Trump with only 10% of the total contributions in both the primaries and the general elections. Hillary Clinton received about 4.7 times more contributions than Donald Trump, yet it did not help her win the race.
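A count-and-percent table like the one above can be built from a single column. This is a base R sketch on a toy vector; the tibble output suggests the report itself used dplyr, and the name `summary_tbl` is made up.

```r
# Toy data; in the report this would be the 'candidate' column.
candidate <- c("Clinton", "Clinton", "Clinton", "Sanders", "Sanders", "Trump")

counts <- sort(table(candidate), decreasing = TRUE)
summary_tbl <- data.frame(
  candidate     = names(counts),
  contributions = as.integer(counts),
  percent       = round(100 * as.integer(counts) / length(candidate), 2)
)
summary_tbl

# Clinton-to-Trump ratio, using the counts printed in the table above.
3506373 / 746616
```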

Sum of contributions and percent per candidate

## # A tibble: 25 x 4
##    candidate contributions        sum percent
##    <chr>             <int>      <dbl>   <dbl>
##  1 Clinton         3506373 482645402.   51.8 
##  2 Trump            746616 121743709.   13.0 
##  3 Sanders         2042624  93986274.   10.1 
##  4 Cruz             543405  69484768.    7.45
##  5 Rubio             98957  39876120.    4.28
##  6 Bush              27487  32972723.    3.54
##  7 Carson           246313  28869741.    3.1 
##  8 Kasich            25238  14685195.    1.57
##  9 Christie           5786   8033999.    0.86
## 10 Fiorina           27615   6714037.    0.72
## # ... with 15 more rows

As we can see, only 8 candidates out of the 25 had more than 1% of the sum of all contributions. Hillary Clinton received 52% of the money contributed, followed by Donald Trump with 13% and Bernie Sanders with 10%.
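Ranking by money instead of by count only needs a grouped sum. The sketch below uses `aggregate()` on made-up rows (the data frame and its values are illustrative) and shows how a candidate can have more contributions but a smaller total.

```r
# Made-up rows: Sanders has the most contributions but the smallest total.
finance <- data.frame(
  candidate = c("Clinton", "Clinton", "Trump", "Sanders", "Sanders", "Sanders"),
  amount    = c(2700, 300, 1000, 27, 50, 23),
  stringsAsFactors = FALSE
)

# Total amount per candidate.
totals <- aggregate(amount ~ candidate, data = finance, FUN = sum)
names(totals)[2] <- "sum"

# Number of contributions per candidate, matched by name.
totals$contributions <- as.integer(table(finance$candidate)[totals$candidate])
totals$percent <- round(100 * totals$sum / sum(totals$sum), 2)

totals[order(-totals$sum), ]
```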

Contributors

Top 10 donors who contributed the highest total amounts

## # A tibble: 1,307,046 x 4
##    contributor                       count  average       sum
##    <chr>                             <int>    <dbl>     <dbl>
##  1 HILLARY VICTORY FUND - UNITEMIZED    14 3090797. 43271164 
##  2 SMITH, MICHAEL                      544     177.    96286.
##  3 MILLER, MICHAEL                     520     171.    88931.
##  4 BOCH, ERNIE                           1   86937.    86937.
##  5 SMITH, JAMES                        454     174.    79205.
##  6 SMITH, WILLIAM                      601     123.    73901.
##  7 SMITH, DAVID                        686     102.    69864.
##  8 WILLIAMS, DAVID                     422     165.    69525.
##  9 BROWN, MICHAEL                      362     188.    67997.
## 10 SMITH, ROBERT                       543     121.    65825.
## # ... with 1,307,036 more rows


In the 2016 elections, rich donors could contribute as much as $360,000 through Hillary Clinton’s campaign. Here is how it worked: donors who were rich, and willing, could give $5,400 to the Clinton campaign, $33,400 to the Democratic National Committee and $10,000 to each of the 32 participating Democratic state party committees, which adds up to $358,800. A joint fundraising committee let the donor do it all with a single check.
On Jan. 1, the contribution limits reset for the party committees, and the Hillary Victory Fund could go back to its donors for another $350,000 in party funds.
While the maximum donation to a presidential campaign was $2,700 for the primary elections (plus another $2,700 for the general), the Hillary Victory Fund could accept much larger contributions because it was a so-called joint fundraising committee made up of multiple committees.
So, the Hillary Victory Fund (HVF) was a fake contributor, and an extreme outlier, in our data. The lack of information about the real contributors must influence one or more of the analyses in this project. The HVF funneled large amounts of money to Hillary Clinton’s campaign, using the state committees as a legal stamp to pass money back and forth so that each donor could reach the maximum amount, leaving only 1% of the contributions with the state committees. As a result, we cannot tell from the data we have, which is the government’s official 2016 contributions database, who Clinton’s biggest donors were and how much each of them gave. Democratic donors, knowing the funds would end up with Clinton’s campaign, wrote six-figure checks to influence the election, 100 times larger than allowed. (from investor.com)
The actual big contributors that were masked by the HVF, like Google, Facebook, JPMorgan Chase & Co, Stanford University, the US Dept of State and others, can be found here.

Number of candidates per contributor (more than 1 candidate)


35,209 people contributed to more than 1 candidate, out of 1,307,046 recorded unique contributors, which is 2.7%. As the number of candidates goes up, the number of donors goes down, which seems logical. Who were the donors who contributed to the largest number of candidates?
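Counting candidates per contributor comes down to counting distinct values within each group. This is a small sketch on toy data. It groups by name only, which is an assumption; since different people can share a name, grouping by name and city together would be a safer key.

```r
# Toy data: contributor A gave to two different candidates.
finance <- data.frame(
  contributor = c("A", "A", "A", "B", "B", "C"),
  candidate   = c("Bush", "Trump", "Bush", "Clinton", "Clinton", "Sanders"),
  stringsAsFactors = FALSE
)

# Number of distinct candidates each contributor gave to.
n_candidates <- tapply(finance$candidate, finance$contributor,
                       function(x) length(unique(x)))

# Contributors who gave to more than one candidate, and their share.
multi <- n_candidates[n_candidates > 1]
multi
100 * length(multi) / length(n_candidates)

# Share of multi-candidate donors from the figures reported above.
round(100 * 35209 / 1307046, 1)
```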

Contributors who donated to the most candidates

## # A tibble: 6 x 4
## # Groups:   contributor [6]
##   contributor        city           candidates    sum
##   <chr>              <chr>               <int>  <dbl>
## 1 WILSON, KIRK       DALLAS                  9 11730.
## 2 CALABRESI, STEVEN  PROVIDENCE              8 24300 
## 3 DRUMMOND, SARA     MONTALBA                8  6700 
## 4 AGRON, DOMINICK    DINGMANS FERRY          7  4154.
## 5 FRIESS, FOSTER MR. JACKSON                 7 18900 
## 6 BRYANT, GORDON     BEAUFORT                6  2025


Kirk Wilson, from Dallas, Texas (there were a couple of Kirk Wilsons in this database), donated to the largest number of candidates, 9 in all. Let’s look at him and his contributions in more detail with a plot.

The #1 multi-candidate contributor


In 2015, Kirk Wilson contributed first to Fiorina and Huckabee and ended with Bush and Christie, giving to Bush 3 times. He then halted his contributions until the end of November, when he gave to Trump twice. As an obvious Republican supporter, why didn’t he give to Trump throughout 2016?

Big and small donors

I will now look into Hillary Clinton’s well-known claim that her campaign relied on small donations (less than $100). I doubled that number and split the data at the $200 mark (as other sources suggested), as the point that separates big and small donors.

How many of Hillary’s donors were small and how many were big, then?

## 
## above $200 below $200 
##     326670     136347


As we can see above, Clinton had about 2.4 times more donors above $200 than below it, contrary to her claim. I wonder what the ratio is for Trump and Sanders, who were her two main opponents in the two elections.
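The following sketch shows one way such a split can be computed. It assumes the $200 mark is applied to each contributor’s total rather than to single contributions. That reading is suggested by the ‘Repeating contributors’ table further down, where a donor averaging $1 per gift is labelled ‘above $200’. Whether exactly $200 counts as big or small is also an assumption.

```r
# Toy data: A gives 250 in total, B gives 25, C gives 150.
finance <- data.frame(
  contributor = c("A", "A", "B", "C", "C", "C"),
  amount      = c(150, 100, 25, 50, 50, 50),
  stringsAsFactors = FALSE
)

# Total given by each contributor, then a label based on the $200 mark.
totals    <- tapply(finance$amount, finance$contributor, sum)
split_200 <- ifelse(totals > 200, "above $200", "below $200")
table(split_200)

# Big-to-small ratio from the counts shown for Clinton above.
326670 / 136347
```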

Trump’s ratio between big and small donors

## 
## above $200 below $200 
##     110867     382719


Trump had almost 3.5 times more small donors than big donors!

Sanders’ ratio between big and small donors

## 
## above $200 below $200 
##     118399     103102


Sanders had almost the same number of small and big contributors: about 1.1 times more big donors than small ones.
Let’s see the distribution of contributions above and below $200 for all candidates in a graph.

Big and small contributors per candidate (split at $200)


It seems that every candidate except Donald Trump received more money from ‘big donors’ than from small ones in the 2016 elections. Trump was far ahead of the rest of the candidates in small-donor contributions. Hillary, on the other hand, was the biggest recipient of big donations, while Sanders, Cruz and Carson received a more balanced mix of contributions from small and big donors.
Working on the above data, I noticed that some people contributed more than once. Let’s see who they were.

Repeating contributors

## # A tibble: 6 x 6
## # Groups:   contributor [6]
##   contributor         candidate count average   sum split_200 
##   <chr>               <chr>     <int>   <dbl> <dbl> <chr>     
## 1 TRUITT, ROBERTA     Clinton    1520       1 1520  above $200
## 2 BODNICK, KATIE      Clinton    1313       4 5465. above $200
## 3 AMISIAL, WILFRID    Clinton    1078       3 3526. above $200
## 4 PURCELL, LARRY      Sanders     705       4 3138. above $200
## 5 SAUNDERS, ELIZABETH Clinton     675       6 4324. above $200
## 6 SCHWARTZ, HILARY    Clinton     622       7 4429. above $200

Wow! Some people contributed hundreds of times. Roberta Truitt, at the top of this table, donated 1,520 times with an average of $1, all to the Clinton campaign. There can be many reasons for that. It could be an automated system that makes the online contributions for a person, or an army of trolls pumping up the number of contributions for their candidate. An interesting question for me here is which candidate had the highest number of repeating contributors. I will define extreme-repeating contributors as those who donated more than 100 times.

Candidates and repeating contributors

## # A tibble: 8 x 3
##   candidate sum_count average
##   <chr>         <int>   <dbl>
## 1 Clinton      200998    149.
## 2 Sanders       77718    137.
## 3 Cruz           8676    142.
## 4 Trump           395    132.
## 5 Johnson         243    122.
## 6 Rubio           217    108.
## 7 Fiorina         107    107 
## 8 Carson          104    104


Hillary Clinton was ahead of everyone else with more than 200K ‘extreme contributions’, followed by Sanders with 78K. The number at the top of each bar is the average number of contributions per extreme donor.
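The ‘more than 100 times’ rule can be sketched in two steps: count contributions per contributor and candidate, then summarise the pairs that pass the threshold. The toy data frame below is made up; the names `sum_count` and `average` follow the columns of the table above.

```r
# Toy data: A, B and D pass the 100-contribution threshold, C does not.
finance <- data.frame(
  contributor = c(rep("A", 120), rep("B", 150), rep("C", 3), rep("D", 101)),
  candidate   = c(rep("Clinton", 270), rep("Trump", 3), rep("Sanders", 101)),
  stringsAsFactors = FALSE
)

# Step 1: contributions per contributor-candidate pair.
pairs <- aggregate(n ~ contributor + candidate,
                   data = transform(finance, n = 1L), FUN = sum)

# Step 2: keep donors with more than 100 contributions to one candidate.
extreme <- pairs[pairs$n > 100, ]

# Per candidate: total extreme contributions and the average per extreme donor.
sum_count <- tapply(extreme$n, extreme$candidate, sum)
average   <- tapply(extreme$n, extreme$candidate, mean)
data.frame(candidate = names(sum_count), sum_count, average)
```

A candidate with a single extreme donor gets identical `sum_count` and `average` values, which matches the Fiorina and Carson rows in the table.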

Occupation

Which occupation gave most donations?

## # A tibble: 13 x 4
##    occupation             number        sum percent
##    <chr>                   <int>      <dbl>   <dbl>
##  1 RETIRED               1642509 163191821.    22.3
##  2 NOT EMPLOYED           626063  31419941.     8.5
##  3 INFORMATION REQUESTED  239700  37880243.     3.3
##  4 ATTORNEY               199767  51658327.     2.7
##  5 TEACHER                141592   7936464.     1.9
##  6 PHYSICIAN              111942  19291577.     1.5
##  7 HOMEMAKER              108421  30022237.     1.5
##  8 PROFESSOR              102188  10125557.     1.4
##  9 CONSULTANT              86321  16469980.     1.2
## 10 ENGINEER                76261   8311212.     1  
## 11 SALES                   62750   5817429.     0.9
## 12 LAWYER                  56398  14809384.     0.8
## 13 MANAGER                 54675   6864216.     0.7


The chart above cannot tell us much, since donors entered about 120,000 different occupations on their contribution forms. The field was free text with no restrictions, so many occupations were written many times in different variations.
To analyze this facet of the dataset, we would have to write an algorithm that searches for similar terms and combines them.
Nevertheless, the share of retired donors in the chart above is pretty impressive compared to their being 14.5% of the population in 2016.
Also interesting is the high share of donors who reported being ‘not employed’ at the time. I would think unemployed people would not have the money to donate, but they did, more than 600,000 times.
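A first version of such an algorithm could normalise the text and then fold known variants into one label. The sketch below is only a starting point: the sample strings and the hand-made `rules` vector are hypothetical, and merging ATTORNEY with LAWYER is a judgment call rather than something the data dictates.

```r
# Made-up free-text entries with typical variations.
occupation <- c("Attorney", " ATTORNEY ", "LAWYER", "attorney at law",
                "RETIRED", "Retired.", "TEACHER", "SCHOOL TEACHER")

# Step 1: normalise case, punctuation and whitespace.
clean <- toupper(occupation)
clean <- gsub("[^A-Z ]", "", clean)
clean <- trimws(gsub(" +", " ", clean))

# Step 2: map known variants to one label (hypothetical, hand-made rules).
rules <- c("ATTORNEY|LAWYER" = "ATTORNEY",
           "TEACHER"         = "TEACHER",
           "RETIRED"         = "RETIRED")
for (pattern in names(rules)) {
  clean[grepl(pattern, clean)] <- rules[[pattern]]
}

sort(table(clean), decreasing = TRUE)
```

For misspellings that no rule anticipates, approximate matching with base R’s `agrep()` or `adist()` would be a natural next step, at the cost of some false merges.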

Contributions by occupation



Gender

Who contributed more in those elections, men or women?


Women had a slight lead with the number of contributions.

Male and female contributions in numbers

## # A tibble: 2 x 2
##   gender contributions
##   <chr>          <int>
## 1 female       3703574
## 2 male         3649698
Women contributed 3,703,574 times and men contributed 3,649,698 times. It is interesting to note that women not only contributed more, they also voted more than men in those elections. According to the Center for American Women and Politics, women have voted more than men in every election since 1964.
Source: Center for American Women and Politics


Why did women vote and contribute more than men? Maybe it is related to the fact that there were 51% women and 49% men in the US in 2016? That is a very interesting question for further research about women’s involvement in political issues, which, unfortunately, is out of the scope of this project.

Date

Amount donated in the years and months leading up to the elections

The red and blue lines are the Republican and Democratic primaries, respectively, and the green line is the general election.

People started to donate already in 2014, but in very small numbers, as can be seen further down. Most of the donors contributed between early 2015 and November 2016. Some kept on giving even after the elections, but this died out after January 2017. We can see a steady build-up of the amount donated, leading to the highest amounts in the months and days before the general election. There was a peak of contributions between February and June of 2016 and a drop right after. This might be related to the Republican and Democratic primaries, which took place between early February and mid-June 2016.

Here is an interactive graph that I used to look for interesting patterns, followed by some insights.

Zoom in by selecting a range of dates with your mouse. To zoom out double click on the graph.

Americans donate very little, if at all, on Saturdays and Sundays, as can be seen when zooming in to a week level on the above graph. This pattern is consistent throughout the election cycle.
Zooming in on the far left side of the graph, we can see that contributions were given as early as late 2013 (and not 2014 as assumed above). The bigger numbers of donations seem to have started kicking in somewhere in mid to late March 2015. Interestingly, less than a month later Hillary Clinton officially announced her run. It also seems that after Trump’s announcement, on June 16th 2015, there was an increase in donations as well.
Another very interesting pattern is revealed when zooming in to a month level. The sum of donations rose sharply on the last day(s) of each month compared to the rest of the month, which shows up as the spikes along the x axis.
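Both calendar patterns can also be checked numerically rather than by eye. The sketch below flags weekend and month-end dates on a few made-up rows; the column names `weekend` and `month_end` are invented for the example.

```r
# Made-up rows: two Saturdays and two gifts on the last day of March.
finance <- data.frame(
  date   = as.Date(c("2016-03-26", "2016-03-28", "2016-03-30",
                     "2016-03-31", "2016-03-31", "2016-04-02")),
  amount = c(10, 50, 40, 100, 200, 5)
)

# Weekend flag: %u is the weekday number, 6 = Saturday, 7 = Sunday.
finance$weekend <- format(finance$date, "%u") %in% c("6", "7")
tapply(finance$amount, finance$weekend, sum)

# Month-end flag: the following day falls in a different month.
finance$month_end <- format(finance$date + 1, "%m") != format(finance$date, "%m")
tapply(finance$amount, finance$month_end, sum)
```

Using the numeric weekday instead of `weekdays()` keeps the check independent of the system language.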
Let’s take a look at the early donations and who received them.

Election type

Facet contributions by election type (General and Primaries)


Now, looking at the distribution of the donations, the donation pattern looks clearer. Donations were mostly given prior to an election. The assumption that the peak of contributions between February and June 2016 in the previous plot is related to the primaries was correct.

Two variable explorations

Number of contributions by gender and party in numbers and in a plot

## # A tibble: 8 x 3
## # Groups:   gender [2]
##   gender party       num_contrib
##   <chr>  <chr>             <int>
## 1 female Democrat        3036176
## 2 male   Democrat        2520043
## 3 male   Republican      1122975
## 4 female Republican       663756
## 5 male   Green              6028
## 6 female Green              3005
## 7 male   Independent         652
## 8 female Independent         637

Contributions by party over time


Looking at the above faceted data, the trend we saw earlier, with contributions growing over time and closer to the general elections, is missing on the Republican side. There actually seems to be a downward trend towards the general election for the Republican party.

Party/Gender

Number of contributions parties received by gender


Women contributed 1.2 times more than men to the Democratic party. On the other side of the aisle, men contributed 1.7 times more than women to Republican candidates. The Green party had an even wider gap between men’s and women’s number of contributions: men contributed twice as many times as women to that party. The Independent party was the only one with an almost identical number of contributions from men and women. We can also see here that Democrats received the highest number of contributions. Did they also receive the highest amount of contributions?
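These ratios can be recomputed directly from the counts in the gender-by-party table above. The sketch below copies those counts into a small matrix; only the object names are invented.

```r
# Counts copied from the gender-by-party table above.
counts <- matrix(
  c(3036176, 2520043,
     663756, 1122975,
       3005,    6028,
        637,     652),
  ncol = 2, byrow = TRUE,
  dimnames = list(party  = c("Democrat", "Republican", "Green", "Independent"),
                  gender = c("female", "male"))
)

# Ratio of the larger group to the smaller one within each party.
ratio <- apply(counts, 1, function(x) max(x) / min(x))
round(ratio, 1)
```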

Sum of contributions parties received

## # A tibble: 4 x 2
##   party       sum_contrib
##   <chr>             <dbl>
## 1 Democrat     581598736.
## 2 Republican   349620307.
## 3 Green          1132327.
## 4 Independent     347398.


Democrats received $582M, about 1.7 times as much as the Republicans.

Did women contribute more times to Hillary Clinton than men?


Women donated to Clinton 1.5 times more than men, and men donated to Trump 1.7 times more than women. It seems that gender played a pretty dominant role in contributions to those two candidates.

Early contributions (2013-2014)


3 out of the 4 candidates who received early donations were Republicans: Cruz, Paul and Rubio. Rubio was the only one who received contributions in 2013 and through most of 2014. Did starting early help Rubio? Let’s see how much money each candidate collected along the way to the elections.

Candidate/date

Accumulative sum of contributions candidates received

(active chart)


We can clearly see above that throughout the election cycle Clinton had a very strong financial lead over her rivals. It is interesting to see the end of each line in the above graph, which looks like it represents the candidate dropping out of the race. Trump received contributions for about two months after the general election. Why would he need more campaign contributions after he had already won the race? Interestingly enough, after some online research I found that all of the candidates kept on receiving donations even after they suspended their campaigns. So, why would any of the candidates who lost keep on receiving donations?
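A running total per candidate, like the lines in the chart above, can be computed with a grouped cumulative sum. This is a base R sketch on made-up rows; `running_total` is a name chosen for the example.

```r
# Made-up rows, deliberately out of date order.
finance <- data.frame(
  candidate = c("Clinton", "Trump", "Clinton", "Trump", "Clinton"),
  date      = as.Date(c("2016-01-05", "2016-01-03", "2016-01-01",
                        "2016-01-10", "2016-01-08")),
  amount    = c(100, 50, 200, 25, 300),
  stringsAsFactors = FALSE
)

# Sort by date first so the running total follows time.
finance <- finance[order(finance$date), ]

# ave() applies cumsum() within each candidate and keeps the row order.
finance$running_total <- ave(finance$amount, finance$candidate, FUN = cumsum)
finance
```

The last row of each candidate then holds that candidate’s final total, and the date of that row marks where the line ends in the chart.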

State/city

A map with the sum of contributions by state and city (dots)


Click the layers icon to switch between city and state statistics.

Looking at the map above and focusing on my area (Silicon Valley, CA), I can clearly see how the richer cities contributed more money, regardless of their size. Leading the state contributions are California with $160M, New York with $130M, Texas with $85M and Florida with $62M. As for cities, Palm Beach pops up first with the size of its red dot and its dark color. Did the amount contributed from each state reflect the size of its population? Let’s take a look at it with 2 charts. In a further investigation I would add a few variables, like gender and party, and try to find hints of relationships between all the variables.

Is there a correlation between the state’s population and number of contributions?


We can see that there is a very strong correlation (0.935) between the number of contributions per state and the state’s population. In a further investigation I would analyze the correlation between cities and their financial contributions to the different parties.
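The coefficient comes from pairing each state’s population with its number of contributions. The sketch below uses made-up figures for five hypothetical states, so its result will not match the 0.935 reported above.

```r
# Made-up figures for five hypothetical states.
states <- data.frame(
  population    = c(39.0, 27.5, 20.0, 6.8, 0.6),  # millions
  contributions = c(1290, 540, 420, 280, 20)      # thousands
)

# Pearson correlation is the default method of cor().
r <- cor(states$population, states$contributions)
r

# A rank-based check that is less sensitive to a few very large states.
rho <- cor(states$population, states$contributions, method = "spearman")
rho
```

Because a handful of very populous states can dominate a Pearson coefficient, comparing it with the Spearman value is a cheap sanity check.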

Reflection

Exploring the 2016 elections’ contributions taught me a lot of things I was not aware of, despite their being publicly available and my being an avid follower of politics. I started the project with a dataset of financial contributions from California, the state I live in, but found it lacking data that is available at the national level, and I knew I could always go back and drill down into the state’s data. Working with the national dataset seemed like more of a challenge, and it was exactly that. Working with more than 7 million rows on a laptop was very difficult at the beginning and especially time consuming, but with time I found better ways and tools for a given task. For example, I experienced issues with dplyr and knitr, so I moved to sqldf, which was much slower, and ended up using built-in R functions, as in the “Percent of contributions per gender” block above. By the time I finished this project I made most of the calls in the code with dplyr. To improve the workflow, I also created a sample file, which I used for running the more time-consuming code blocks.
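A sample file of this kind can be drawn in a couple of lines. The sketch below assumes a 1% simple random sample and a fixed seed; both choices are illustrative, as are the stand-in data frame and the name `finance_sample`.

```r
set.seed(2016)  # fixed seed so the same sample is drawn every time

# Stand-in for the full data frame (the real one has about 7.35 million rows).
finance <- data.frame(id = 1:100000, amount = runif(100000, 1, 500))

# Draw a 1% random sample of rows without replacement.
idx <- sample(nrow(finance), size = round(0.01 * nrow(finance)))
finance_sample <- finance[idx, ]

nrow(finance_sample)
```

Rare cases, such as the handful of extreme contributors, can easily be missing from a small sample, so final numbers still need a run on the full data.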

Findings

Trivial findings
  • Hillary Clinton led all the way in both the amount and the number of contributions. What is not trivial, and surprised many Americans and especially her campaign, is that the money did not buy them the White House.
  • Republican candidates outnumbered those of all the other parties.
  • The American political system is made up of 2 main parties and leaves small parties ‘out of the game’.
  • $25 was the most frequent contribution.
  • Hillary Clinton relied on ‘big donations’ more than on small donations.
  • Men and women contributed more or less the same in those elections in general.
  • There were hardly any contributions during the weekends compared to the week days, and at the end of each month there was a spike in contributions.
  • After Trump’s and Hillary’s announcements, more contributions flowed in.
  • Contributions started to come in as early as 2013.
Surprising findings
  • Bernie Sanders had more contributions for the primaries than the presidential winner had throughout the entire election season.
  • Hillary Clinton received 48% of the contributions and 52% of the sum of all the contributions, while her opponent received only 10% of the contributions and 13% of the sum of all the contributions.
  • Hillary Clinton’s campaign allegedly received $84M in illegal campaign contributions from rich donors, who would not have been able to donate as much as $360K if not for this special ‘arrangement’ by the HVF and the DNC.
  • 35K people donated to 2 or more candidates.
  • Donald Trump had 3.5 times more small donations than big ones, and was the only candidate who received more small donations than big ones. Bernie Sanders received more or less the same number of big and small contributions. I was surprised by this statistic since I was always sure that he led the small donations’ realm with his grassroots movement.
  • Some people gave hundreds of times, and 3 even gave more than a thousand times. Hillary Clinton led this list with 201K repeating contributions, followed by Bernie Sanders with 78K.
  • Retired and unemployed people contributed more than everyone. Note that this facet of the analysis is not complete since the data in this column is far from being clean.

As for exploring one, two and multiple variables, I found that it is sometimes necessary to add a two-variable plot or explanation right after exploring one variable, for the sake of continuity and readability. An example is the #1 multi-candidate contributor plot, which drilled down on the list of contributors who donated to more than one candidate.

Naturally, the most challenging part, and the one that took the longest, was the data wrangling, which I saved in a separate file called all-munge.R. This file outputs a clean dataset with all the 2016 elections’ contributions, which saved me tons of time because I did not have to run the entire script each time I closed the program (RStudio) or when it crashed.

Data can be misleading if it is not connected to the real-life events behind the topic being investigated. For example, the Clinton campaign had many contributions from many contributors coming in under only one name (Hillary Victory Fund). The ‘contributor’ HVF was an outlier that skewed the data. On the other hand, removing this ‘contributor’ from the list means removing the sums of the donations that the HVF encompasses. For example,

External datasets. To fill in missing information in the dataset, I used data from different sources. For the US population and state information, like zipcodes, longitude and latitude, I used census.gov. The cities data was taken from simplemaps.com.

Removing and adding new variables. I found through this project that, on one hand, you want to minimize the number of columns for the sake of speed, and on the other hand, those same variables can turn out to be meaningful further down the analysis. I had to go back and recreate the program, adding the old variables back to the dataset.

If I need to do this project again, I will use the Wine dataset. That dataset has 12 numeric vectors that are straightforward for correlation analysis, compared to 1 in the 2016 elections dataset I explored here. Nevertheless, I very much enjoyed looking at the 2016 elections data and came up with some interesting points that I was not aware of.